Evaluating the Usefulness of Overview Visualizations
for Users with Varying Levels of Domain Knowledge

Susanne Bleisch*, Matt Duckham*, Jarod Lyon** 

* Department of Infrastructure Engineering, University of Melbourne, Aus- 
tralia, susanne.bleisch@unimelb.edu.au, matt@duckham.org 
** Arthur Rylah Institute, Melbourne, Australia, jarod.lyon@dse.vic.gov.au



Abstract. Exploratory data analysis may be supported by suitable visuali- 
zations. This study aims to evaluate how useful overview visualizations are 
to generate hypotheses for further analysis. Focus groups with domain ex- 
perts and lay users participated in the evaluation of four different overview 
visualizations. The results show that all visualizations have their value but 
are of varying difficulty to interpret. Lay users were able to learn about the
data set and data collection process. Both domain experts and lay users 
could use the visualizations to develop ideas and hypotheses about the
represented data set. While experts were able to use their context knowledge
to report findings not picked up by lay users, they also seemed somewhat
restricted in openness and creativity, as findings not related to domain
knowledge were generally doubted.

1. Introduction

Suitable visualizations are assumed to support exploratory data analysis 
(EDA, Tukey 1977) where the users want to familiarize themselves with the 
data, search for interesting patterns or outliers, gain ideas for further
investigation, or formulate hypotheses for detailed analysis. The information
seeking mantra (Shneiderman 1996) suggests starting with an overview 
before offering tools for filtering the data or examining details. While the
information seeking mantra is often used as a guiding principle for
visualization design (Craft & Cairns 2005), there is scant literature regarding
the importance and usefulness of visual overviews for generating ideas
and hypotheses.

Spatio-temporal location-based movement data is one type of available
movement data (Andrienko et al. 2011). Here, we utilized location-based
movement data from a radio-tagging network in the Murray River, Australia
(Figure 1). In total, more than 1000 fish were radio-tagged, and 18 logging
towers recorded on a daily basis when fish moved past them between 2006
and 2011. Originally, the data set was collected for evaluating the influence
of management interventions in specific river sections on the native fish
population. However, the rich data set may additionally be able to give
further insight into fish behavior in the regulated Murray River.




Figure 1. Schematic drawing of the monitored part of the Murray River,
Australia.



This study aims to evaluate different overview visualizations in regard to 
their usefulness for generating ideas or hypotheses for further detailed data 
analysis. Tukey (1977) stresses the importance of domain or context 
knowledge for exploratory data analysis. We evaluated different overview 
visualizations with users interested in data analysis but possessing different
levels of domain knowledge to analyze the influence of context knowledge
on their usefulness.



2. Methods 

2.1. Overview visualizations

The data set was reviewed for its structure and four different overview
visualizations were created. We define overview visualizations as
representations showing all the data in a single display. Generally,
highlighting specific aspects or subsets of the data (in this case variables
such as species or seasons) is avoided. The only assumption made is that fish
movement is more relevant for data analysis than stationary fish. Different
techniques for displaying all data points concurrently are used. However,
overview visualizations can also be achieved by using different aggregations
based on the structure of the data (for example, summarizing all fish movements
based on predetermined criteria). Two of the created visualizations are
adaptations of overview visualizations already in use or suggested by the data
owners. Two additional overview visualizations were devised based on the
data characteristics. Basing the visualizations on the data characteristics
ensures that the same visualization types could also be used for other
location-based movement data sets.
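
As a rough illustration of the two approaches described above (showing every data point versus aggregating movements), the following Python sketch arranges detections into a fish-by-day table of river zones and then counts changes of zone. It is only a sketch under stated assumptions: the record layout (fish id, day index, zone number), the function names, and the sample values are hypothetical and do not reflect the format of the actual data set or the visualizations evaluated here.

```python
from collections import Counter


def build_overview_matrix(records, n_days):
    """Arrange detections as {fish_id: [zone or None for each day]}.

    `records` is an iterable of (fish_id, day_index, zone) tuples; this
    layout is an assumption made for illustration only.
    """
    matrix = {}
    for fish_id, day, zone in records:
        row = matrix.setdefault(fish_id, [None] * n_days)
        row[day] = zone
    return matrix


def count_zone_transitions(matrix):
    """Aggregate movements by counting zone changes between detections."""
    transitions = Counter()
    for row in matrix.values():
        detected = [zone for zone in row if zone is not None]
        for previous, current in zip(detected, detected[1:]):
            if previous != current:  # stationary fish are ignored
                transitions[(previous, current)] += 1
    return transitions


# Hypothetical sample records, not taken from the study data.
records = [
    ("fish_a", 0, 1), ("fish_a", 1, 1), ("fish_a", 2, 2),
    ("fish_b", 0, 3), ("fish_b", 2, 2), ("fish_b", 3, 3),
]
matrix = build_overview_matrix(records, n_days=4)
transitions = count_zone_transitions(matrix)
```

Each row of `matrix` could be drawn as one line of an all-data display, with the zone mapped to color or position, while `transitions` is one possible aggregated summary that keeps only movement between zones.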

2.2. Data collection 

The four different overview visualizations were discussed in three focus 
group sessions with participants having varying levels of domain 
knowledge. Focus groups are small groups moderated to ensure that specific
topics are discussed by the group (Grumbein & Lowe 2010). One focus
group consisted of three domain experts (river ecologists), who were famil- 
iar with the test data set. They are assumed to have the best possible 
knowledge of the data set that is to be analyzed. Two other focus groups 
consisted of four PhD students in each group who are interested in data 
analysis but have only lay knowledge of fish behavior in rivers.

The focus group sessions were moderated to ensure similar treatment of the
different groups. All groups were first introduced to the data set and its
collection and then to the four different visualizations of the data set. The
introduction took about 10-12 minutes. The rest of each 60-minute session
was used to discuss the visualizations and the data they show. Participants
were encouraged to report what they saw in the visualizations, to ask
questions, and to generate hypotheses and ideas about how they would like to
further explore the data set. Participants were asked for their preferences
when the discussions were ebbing away.

With the participants' consent, video recording was used to capture state- 
ments and to allow for reconstruction of the knowledge built up through 
discussion and the use of the representations, i.e. including participants 
sketching on the paper printouts of the representations. 

2.3. Data analysis 

The video recordings were transcribed and the collected data was qualita- 
tively analyzed for insights (North 2006), insights being the unexpected in 
the data or the generated hypotheses for further exploration. As insights 
were rarely reported directly, the data was subsequently qualitatively ana- 
lyzed for its content according to an emerging coding scheme. The scheme 
comprises problems with the visualization, findings in a broader sense,
questions asked, improvements suggested, and participants' preferences.
The coded and summarized content was then compared between the different
visualizations and between the focus groups with different levels of
context knowledge to gain an understanding of the usefulness of different
visualization types and the influence of domain knowledge.
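
The comparison step described above can be pictured with a small Python sketch that files coded statements under visualization, focus group, and code so they can be read side by side. This is an illustrative assumption only: the dictionary keys, labels, and placeholder statements are hypothetical, and the sketch is not the procedure or tooling used in the study. Only the five code names follow the coding scheme named in the text.

```python
from collections import defaultdict

# Categories of the emerging coding scheme as listed in the text.
CODES = ("problem", "finding", "question", "improvement", "preference")


def group_statements(statements):
    """File statement texts under (visualization, group, code)."""
    grouped = defaultdict(list)
    for statement in statements:
        if statement["code"] not in CODES:
            raise ValueError(f"unknown code: {statement['code']}")
        key = (statement["visualization"], statement["group"], statement["code"])
        grouped[key].append(statement["text"])
    return dict(grouped)


def compare_groups(grouped, visualization, code):
    """Return {group: [texts]} for one visualization and one code."""
    return {
        group: texts
        for (vis, group, c), texts in grouped.items()
        if vis == visualization and c == code
    }


# Placeholder statements; labels and texts are hypothetical.
statements = [
    {"visualization": "vis_1", "group": "experts", "code": "finding",
     "text": "placeholder statement 1"},
    {"visualization": "vis_1", "group": "lay_1", "code": "question",
     "text": "placeholder statement 2"},
    {"visualization": "vis_1", "group": "lay_1", "code": "finding",
     "text": "placeholder statement 3"},
]
grouped = group_statements(statements)
findings_vis_1 = compare_groups(grouped, "vis_1", "finding")
```

The sketch keeps the statement texts rather than reducing them to counts, so the comparison across groups stays qualitative.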

The data was not analyzed quantitatively. Focus groups have the advantage 
that the participants can build upon each other's statements and the discus- 
sions are thus likely to go deeper and be more varied. However, statements 
or findings are often reported only once, as participants tend not to repeat. 
Participants may nod or give another indication of their agreement with 
statements of others but this is difficult to interpret as it could mean that 
they agree but have not thought of that aspect themselves or it could mean 
that they had similar thoughts. Thus, quantitative analysis of statements is 
difficult. 



3. Results 

All four overview visualizations have their value in helping the focus group
participants to develop hypotheses and see specific aspects in the
data. All focus groups also pointed out problems with the visualizations and
suggested improvements. Insights or findings were rarely reported directly;
more often they were phrased as questions or even as problems. For example,
reporting not being able to see something could indicate that the visible
was interpreted in a certain way.

The most basic visualization (showing all fish over time and indicating the 
different river zones (space, cf. Figure 1) where a fish stayed on a specific 
day as color value) helped the lay users most in gaining a better under- 
standing of the data set and the data collection process. The domain experts 
used the basic visualization only briefly for confirming their knowledge of 
the data set by relating findings to specific aspects of the data collection 
process they knew about. All groups commented on the less than optimal 
use of color to convey spatial information. Overall, all overview visualiza- 
tions greatly helped the lay users gain a deeper understanding of the data 
set, measured as the total number of questions asked. The visualizations 
prompted the domain experts to discuss some novel aspects of the data set 
and the data collection process, and to reconsider some of the original hy- 
potheses. 

Generally, the lay users reported findings similar to those of the domain
experts. Lay users often phrased findings as questions but were more open in
accepting a finding as interesting and were creative in suggesting ways in
which findings could be analyzed further to, for example, prove the existence
of a pattern seen in the visualizations. This may be related to their more
general interest in data analysis and knowledge about data analysis techniques.
The domain experts, on the other hand, often instantly related the findings to
their context knowledge. Findings that did not match up with previous
knowledge were phrased as questions and often doubted. However, in a few
instances the domain expert group pointed out interesting patterns, which
they would like to explore further, that the lay user groups did not report.

Statements of preferences for the different visualization types are difficult 
to interpret. Generally, all focus groups agreed that the visualizations are 
more valuable if more time is spent with them and that most cannot be un- 
derstood and analyzed at a first glance or rather are more useful after some 
discussion of what they show. All groups commented positively upon the
underlying spatial representation of the river network (cf. Figure 1) in one of
the visualizations, as it allows the instant localization of the different river
zones. However, it was also mentioned that the easy look may be deceptive.
All the other visualizations use a more abstract encoding of the spatial
information through ordering or color.



4. Conclusion and Outlook 

The overview visualizations evaluated in this study are useful for domain
experts and lay users. The latter can use them for gaining a better
understanding of the data set in general but also for identifying patterns and
generating ideas for further analysis. In most cases, domain experts reverted to
their context knowledge rather than perusing the visualizations and
examining their findings in more depth. The context knowledge may even
override some of the creativity or openness that is presumably needed when
trying to learn something new or additional from a well-known data set.

Evaluating the same visualizations with a focus group of domain experts 
who do not have knowledge about the specific data set used but rather the 
domain will extend this study. Additionally, another focus group with peo- 
ple interested in data analysis but with more research experience than PhD 
students will be included. Those results should allow expanding on some of 
the findings reported here. 